ABSTRACT


In this paper, we present a multiple-bit error correction coding scheme for
on-chip interconnects, based on an extended Hamming product code combined
with type II HARQ and built on shared resources. The shared resources reduce
the hardware complexity of the encoder and decoder compared to the existing
three-stage iterative decoding method for on-chip interconnects. The proposed
decoding method achieves 20% and 28% reductions in area and power
consumption, respectively, with only a small increase in decoder delay
compared to the existing three-stage iterative decoding scheme for
multiple-bit error correction. The proposed code also achieves an excellent
improvement in residual flit error rate and a reduction of up to 58% in total
power consumption compared to other error control schemes. The low complexity
and excellent residual flit error rate make the proposed code suitable for
on-chip interconnection links.

1. INTRODUCTION

Interconnection of all processing elements (PEs) or intellectual property (IP) cores on a single chip 
by employing traditional on-chip communication infrastructure, like shared buses or multi-layer buses, 
results in issues with scalability and IP reusability. This motivates system on chip architects to shift to 
network on chip (NoC). NoC provides scalable and high bandwidth communication infrastructure to various 
multi-core and many-core architectures [1-3]. In very deep submicron technology (VDSM), on chip 
interconnect errors are caused by different effects like supply voltage fluctuation, electromagnetic 
interference (EMI), process variation and crosstalk [4-9]. Reliability can be improved by applying error 
control techniques, such as automatic repeat request (ARQ), forward error correction (FEC), and hybrid ARQ 
(HARQ) to on-chip interconnects [10-14]. These three error correction techniques have different error 
correction capability and different hardware complexity. In [12,13], the techniques were able to correct single 
bit or two-bit random errors only. But, the probabilities of occurrence of multiple random and burst errors are 
getting higher [14], which urged the need for more powerful coding techniques. The combintion of crosstalk 
avoidance code with error control code can improve the error correction capability [15]. In [16], the use of 
simple parity calculation along with message triplication can achieve two random error correction and some 
of three. In joint crosstalk avoidance and triple error correction (JTEC) [17], the correction capability was 
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increased by using Hamming code with message duplication to correct three errors. Further optimization was 
applied for this scheme for triple error correction and quadruple error detection (JTEC-SQED) [17]. In joint 
crosstalk aware multiple error correction (JMEC) the use of changed interleaving distance between adjacent 
bits made the correction of nine adjacent errors possible [18]. Duplication with two dimensional parities was 
also proposed to provide up to seven errors detection [19] or six errors detection and single error correction [20]. 
In multi bit random and burst error correction (MBRBEC) five errors correction was possible by using 
extended Hamming code with messge triplication [21]. Quintuplicated manchester error correction (QMEC) 
achieved nonuple errors correction [22]. All the duplication, triplication and quintuplication based coding 
schemes allowed high error correction due to the high redundancy which is translated into high link size. 

Hamming product codes [23] are used to correct both multiple random and burst errors without a high
link size overhead. In [24-27], the authors use a three-stage decoding circuit for Hamming product codes with
type II HARQ to achieve a good correction capability (up to five random errors, as well as burst errors). The
main drawback of this approach is the complexity of the three-stage circuit. In [28], the authors designed
two-stage row-column decoding with keyboard scan-based error flipping instead of the three-stage decoding
design proposed in [24-26]. However, the reduction in decoder area came at the cost of correction capability,
because the design can correct fewer random errors and burst errors of only four bits. In [29], the authors
used the same principle of the Hamming product code but with a different arrangement: extended Hamming codes
on the rows and simple parities on the columns, which reduces the circuit complexity and the number of parity
bits. The smaller message size allowed them to send the message at once, without the use of an ARQ technique,
but at the cost of a large drop in correction capability compared with the work in [24-26], as will be
reanalyzed later in this paper. The design proposed in this paper pursues the same goal as [28, 29], namely
reducing the circuit complexity of [24-26], but differs by not sacrificing correction capability to gain the
area and power savings. The proposed coding applies the concept of resource sharing to the traditional
three-stage Hamming product code decoding. The proposed circuit has the same correction capability as the
three-stage Hamming method with less area and power consumption.


2. EXTENDED HAMMING PRODUCT CODE WITH TYPE II HARQ

The input flit of $k$ bits is arranged into a ($k_1 \times k_2$) matrix, as shown in Figure 1. Row parity
check bits are obtained by encoding the $k_1$ bits in each row using an ($n_1$, $k_1$) extended Hamming row
encoder, where $n_1$ is the length of the row encoded word. Column parity check bits are obtained by encoding
the $k_2$ bits in each column using an ($n_2$, $k_2$) extended Hamming column encoder, where $n_2$ is the
length of the column encoded word. Checks-on-checks can be generated by encoding the column parity check bits
using the row encoder. In [24], the authors used an extended Hamming product code with type II HARQ to reduce
the number of interconnection links.
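
The arrangement described above can be illustrated with a minimal Python sketch. It is not the encoder of [24]: the function names (`eh_checks`, `product_encode`) are hypothetical, and the check-bit layout (data bits at the non-power-of-two positions of a standard Hamming code, plus one overall parity bit) is an assumption, since the text does not give the generator matrices.

```python
def eh_checks(data):
    """Check bits of an extended Hamming code: r Hamming checks + 1 overall parity.

    Assumed layout: data bits occupy the non-power-of-two positions 1..k+r.
    """
    k = len(data)
    r = 0
    while (1 << r) < k + r + 1:          # smallest r with 2^r >= k + r + 1
        r += 1
    positions = [p for p in range(1, k + r + 1) if p & (p - 1)]
    checks = []
    for j in range(r):
        bit = 0
        for p, d in zip(positions, data):
            if p & (1 << j):             # check j covers positions with bit j set
                bit ^= d
        checks.append(bit)
    overall = 0
    for b in list(data) + checks:        # overall parity over data and checks
        overall ^= b
    return checks + [overall]


def product_encode(flit, k1, k2):
    """Return (row parities, column parities, checks-on-checks) for a k1*k2-bit flit."""
    assert len(flit) == k1 * k2
    rows = [flit[i * k1:(i + 1) * k1] for i in range(k2)]
    row_par = [eh_checks(row) for row in rows]               # k2 lists of n1-k1 bits
    cols = [[rows[i][j] for i in range(k2)] for j in range(k1)]
    col_par = [eh_checks(col) for col in cols]               # k1 lists of n2-k2 bits
    # Arrange the column parities as rows, then row-encode them.
    par_rows = [[col_par[j][i] for j in range(k1)] for i in range(len(col_par[0]))]
    coc = [eh_checks(row) for row in par_rows]
    return row_par, col_par, coc


# Example: a 32-bit flit arranged with k1 = 8 bits per row and k2 = 4 rows.
flit = [int(c) for c in format(0xDEADBEEF, "032b")]
row_par, col_par, coc = product_encode(flit, k1=8, k2=4)
first_tx_bits = len(flit) + sum(len(p) for p in row_par)
print("first transmission bits:", first_tx_bits)
```

With these parameters the rows are EH(13, 8) words and the columns are EH(8, 4) words, so the first transmission carries the data and the row parities only; the column parities and checks-on-checks are what a retransmission would add.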

2.1. The design of encoder

The encoding process of Hamming product codes with type II HARQ in [24-26] is shown in
Figure 2(a). The $k$-bit input message is encoded using row and column encoders. Extended Hamming codes
EH($n_1$, $k_1$) are used for row encoding and EH($n_2$, $k_2$) for column encoding. The outputs of the row
encoders pass through the row-column interleaver, and the interleaved output is sent through the link to the
decoder. The outputs of the column encoders are saved in a buffer. If a NACK is received, the saved parities
pass through a second set of row encoders to generate the checks-on-checks. The checks-on-checks and column
parity check bits are then fed into the row-column interleaver and sent to the decoder.

The proposed design, shown in Figure 2(b), exploits the similarities between the first circuit (row
encoders) and the second circuit (column encoders) and combines them into one circuit (general encoder).
In addition, the general encoder is responsible for the checks-on-checks calculation, eliminating the need
for the third circuit (second row encoders). The proposed design builds on the result of [24-26] that the
arrangement of the Hamming product code giving the minimum link size always has four row encoders, whatever
the message size. The column encoders are therefore always four-bit extended Hamming circuits. With $k_1$
column encoders of four input bits each, $M$ of them can be combined to form one row encoder. The number $M$
of column encoders needed to form one row encoder can be written as:

$$M = \frac{k / k_2}{4} \qquad (1)$$


Figure 3 shows the internal design of the general encoder, where the outputs of each group of $M$ column
encoders are input to one Column-to-Row circuit. This circuit takes the ($n_2 - k_2$) parity bits from each
of the $M$ column encoders and XORs them to form one set of ($n_1 - k_1$) parity bits. The proposed encoder
first encodes the $k$ input data bits using the row encoder and saves a copy of the original data in a
buffer. When a NACK is received, the saved data is encoded using the column encoder and fed back to calculate
the checks-on-checks using the row encoders. As mentioned earlier, the checks-on-checks calculation circuit
works the same way as the row encoders in [24-26], so the third circuit is replaced by reusing the general
encoder as a row encoder. The buffer size in the proposed encoder is the same as in the previous work of
[24-26]. As mentioned earlier, the authors used $k_2 = 4$, and the extended Hamming code for four data bits,
EH(8, 4), also produces four check bits. Hence the ($(n_2 - k_2) \times k_1$) bits saved in the buffer in
their work equal in number the $k$ bits saved in our encoder. Note that the proposed encoder consumes two
clock cycles to calculate the column checks and the checks-on-checks after a NACK. This is one clock cycle
more than in [24-26], a small penalty for the power savings gained because the proposed encoder performs
column encoding only after a NACK is received.
Figure 2. (a) Encoder of [24], (b) Proposed encoder
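
As a quick numerical check of the sizing argument in this subsection, the following Python sketch evaluates (1) and the two buffer sizes for a given flit width. The helper names are hypothetical, and the check-bit count assumes a standard extended Hamming code (the smallest r with 2^r >= k + r + 1, plus one overall parity bit).

```python
def eh_check_bits(data_bits):
    """Number of check bits of an extended Hamming code for data_bits data bits."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1                     # r Hamming checks plus one overall parity bit


def hpc_parameters(k, k2=4):
    """Derived sizes for a k-bit flit arranged in k2 rows (k2 = 4 as in the text)."""
    assert k % k2 == 0 and (k // k2) % 4 == 0
    k1 = k // k2                     # data bits per row
    n1 = k1 + eh_check_bits(k1)      # row codeword length
    n2 = k2 + eh_check_bits(k2)      # column codeword length
    return {
        "k1": k1,
        "n1": n1,
        "n2": n2,
        "M": k1 // 4,                                # equation (1)
        "first_tx_bits": n1 * k2,                    # row codewords only
        "buffer_bits_column_parity": (n2 - k2) * k1, # buffer holding column parities
        "buffer_bits_data_copy": k,                  # buffer holding a data copy
    }


params = hpc_parameters(32)
print(params)
```

For a 32-bit flit this gives two column encoders per row encoder and two buffers of equal size, in line with the statement that the buffer requirement does not grow.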


2.2. The design of decoder 

The decoding process of Hamming product codes with type II HARQ in [24-26] is shown in
Figure 4(a). The encoded data is applied to the extended Hamming row decoder, which corrects any single error
that occurs in a row. If errors are detectable but not correctable, a NACK signal is sent back to the encoder
after the row decoded data and row parity bits are stored in a buffer. At the same time, the row condition
vector (RCV) is formed, which holds one bit of information about each row: the RCV bit is zero when the row
has no error or one error, and one when the row has a double error. When the column parity check bits and
checks-on-checks are received, they are first passed through the row decoder to correct any error. The
previously row decoded data stored in the buffer is row-column interleaved. The column parity check bits and
the row-column interleaved data are passed to the second-stage column decoder, which corrects any single
error that occurs in a column and forms the column condition vector (CCV). The output is then sent to the
third-stage row decoders, which form a new RCV after the column corrections of the second stage. The
third-stage decoders contain a simple flipping circuit, used in [24-26], that corrects rectangular errors
using the CCV and the newly formed RCV. In the proposed decoder, shown in Figure 4(b), the third-stage row
decoders are removed and the first-stage row decoder is shared to perform the row decoding of both the first
and third stages. The flipping circuit is separated so that it is used only in the third stage of decoding.
The proposed decoder works the same way as the circuit in [24-26], but instead of a dedicated third stage it
uses a feedback path to perform the third-stage row decoding, followed by the flipping circuit to correct
rectangular errors.
Figure 4. (a) Decoder of [24], (b) Proposed decoder
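
The behaviour of a single row decoder and its RCV bit can be sketched in Python as follows. This is only an illustration of single-error correction with double-error detection; the names `eh_encode` and `eh_decode_row`, and the codeword layout (overall parity at index 0, Hamming checks at power-of-two positions), are assumptions and do not describe the circuits in Figure 4.

```python
def eh_encode(data):
    """Extended Hamming codeword; index 0 is the overall parity bit (assumed layout)."""
    k = len(data)
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    n = k + r
    word = [0] * (n + 1)
    bits = iter(data)
    for p in range(1, n + 1):
        if p & (p - 1):                  # non-power-of-two position: data bit
            word[p] = next(bits)
    for j in range(r):
        mask = 1 << j
        parity = 0
        for p in range(1, n + 1):
            if p & mask:
                parity ^= word[p]
        word[mask] = parity              # Hamming check bit j
    word[0] = sum(word[1:]) % 2          # overall parity over positions 1..n
    return word


def eh_decode_row(word):
    """Return (corrected word, RCV bit). RCV = 1 marks an uncorrectable row."""
    n = len(word) - 1
    syndrome = 0
    for p in range(1, n + 1):
        if word[p]:
            syndrome ^= p                # XOR of the positions holding a 1
    odd = sum(word) % 2
    fixed = list(word)
    if odd:                              # odd weight: treat as a single error
        if syndrome > n:                 # points outside the word: give up
            return fixed, 1
        fixed[syndrome] ^= 1             # syndrome 0 flips the overall parity bit
        return fixed, 0
    return fixed, int(syndrome != 0)     # even weight, nonzero syndrome: double error


clean = eh_encode([1, 0, 1, 1, 0, 0, 1, 0])
one_error = list(clean)
one_error[5] ^= 1
two_errors = list(one_error)
two_errors[9] ^= 1
print(eh_decode_row(one_error)[1], eh_decode_row(two_errors)[1])
```

A row with a double error is left unchanged and only flagged, which is the information the later column decoding and flipping stages rely on.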


3. RESULTS AND ANALYSIS 

In this section, a reliability analysis of the Hamming product code with type II HARQ and of the MECCRLB
code is presented. The two are then compared in terms of error correction capability, link swing voltage,
link power consumption, codec area, codec power and codec delay.


3.1. Reliability analysis 


The reliability is measured by the residual error probability $P_{residual}$, which represents the
probability of decoder error or failure [30-32]. $P_{residual}$ can be expressed as:

$$P_{residual} = 1 - P_{pd} \qquad (2)$$

where $P_{pd}$ is the probability of proper decoding, which is the sum of the probabilities of correcting
random errors and burst errors. Only random errors are considered here, since they are the most frequent and
for the sake of simplicity.


3.1.1. Extended Hamming product code with type II HARQ
$P_{residual}$ depends on both the error detection capability in the first transmission and the error
correction capability after the retransmission. $P_{residual}$ is estimated as given in [24]:

$$P_{residual} = P_{ud} + P(e_{decoding}, e_{detect}) \qquad (3)$$

where $P_{ud}$ is the undetectable error probability in the first transmission and
$P(e_{decoding}, e_{detect})$ is the probability of error after the retransmission and the three-stage
decoding. $P_{ud}$ can be expressed as:

$$P_{ud} = 1 - (P_{ne} + P_c + P_d) \qquad (4)$$

where $P_{ne}$ is the probability of no error, and $P_c$ and $P_d$ are the probability of correctable error
patterns and the probability of detectable but uncorrectable error patterns in the first transmission,
respectively. $P(e_{decoding}, e_{detect})$ can be expressed as:

$$P(e_{decoding}, e_{detect}) = P_d \cdot (1 - (P_{ne} + P_c)) \qquad (5)$$

By inserting (4) and (5) in (3) we get:

$$P_{residual} = 1 - (1 + P_d) \cdot (P_{ne} + P_c) \qquad (6)$$
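
The substitution leading to (6) can be written out step by step. The shorthand $S = P_{ne} + P_c$ is an auxiliary symbol introduced here for readability only; it does not appear in [24].

```latex
\begin{align*}
P_{residual} &= P_{ud} + P(e_{decoding}, e_{detect}) \\
             &= \bigl[\, 1 - (S + P_d) \,\bigr] + P_d \, (1 - S) \\
             &= 1 - S - P_d + P_d - P_d \, S \\
             &= 1 - (1 + P_d) \, S \\
             &= 1 - (1 + P_d)\,(P_{ne} + P_c)
\end{align*}
```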


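Since any error pattern with a single error in one row and the other errors in different rows can be
corrected in the first transmission, $P_c$ for random errors can be written as:

$$P_c = \sum_{i=1}^{k_2} \binom{k_2}{i} \left( n_1 \varepsilon (1 - \varepsilon)^{n_1 - 1} \right)^{i} \left( (1 - \varepsilon)^{n_1} \right)^{k_2 - i} \qquad (7)$$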


where $k_2$ is the number of rows, $n_1$ is the row bit size, and $\varepsilon$ is the bit error rate. After
retransmission, the proposed work can correct five random errors, so $P_d$ for random errors is defined in (8)
as given in [24]. The first term is the error detection probability when two or three random errors occur in
the first transmission. The second and third terms in (8) are the error detection probabilities of four and
five random errors, respectively.


[Equation (8): three-term expression for $P_d$ as given in [24]; not legible in this copy.]

The probability of no error in the first transmission can be expressed as in [24]:

$$P_{ne} = (1 - \varepsilon)^{n_1 k_2} \qquad (9)$$

Substituting (7)-(9) into (6) gives $P_{residual}$.
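
A minimal Python sketch of this substitution is given below. It assumes that $P_c$ is the probability that at least one row holds exactly one error while every other row is error free, which is how the text describes it, and it takes $P_d$ as an input because its closed form is given in [24]. The function names and the illustrative value of $P_d$ are assumptions, not results from the paper.

```python
from math import comb


def p_no_error(eps, n1, k2):
    """Probability that all n1*k2 bits of the first transmission are error free."""
    return (1.0 - eps) ** (n1 * k2)


def p_correctable(eps, n1, k2):
    """Probability that i >= 1 rows have exactly one error and the rest have none."""
    one = n1 * eps * (1.0 - eps) ** (n1 - 1)   # exactly one error in a row
    none = (1.0 - eps) ** n1                   # no error in a row
    return sum(comb(k2, i) * one ** i * none ** (k2 - i) for i in range(1, k2 + 1))


def p_residual(p_ne, p_c, p_d):
    """Residual error probability: 1 - (1 + P_d) * (P_ne + P_c)."""
    return 1.0 - (1.0 + p_d) * (p_ne + p_c)


eps, n1, k2 = 1e-3, 13, 4                      # HPC(52, 32): four rows of 13 bits
p_ne = p_no_error(eps, n1, k2)
p_c = p_correctable(eps, n1, k2)
# Illustrative assumption only: every remaining pattern is detectable,
# so the undetectable-error probability is zero.
p_d = 1.0 - p_ne - p_c
print(p_ne, p_c, p_residual(p_ne, p_c, p_d))
```

Under that idealized choice of $P_d$ the residual probability reduces to the square of $1 - (P_{ne} + P_c)$, which makes the benefit of the retransmission easy to see.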
3.1.2. MECCRLB 
The work in [29] is an FEC-based coding scheme, with no retransmission. As a result, $P_{residual}$ depends
only on the correction capability of the code:

$$P_{residual} = 1 - P_c \qquad (10)$$

The authors in [26] indicate that MECCRLB can correct up to 11 random errors in a 32-bit message, so $P_c$
for random errors was expressed in [26] as:

[Equation (11): expression for $P_c$ as given in [26]; not legible in this copy.]

However, that equation is not accurate, as not every pattern of up to 11 random errors can be corrected by
the MECCRLB decoding [8]. Figure 5 shows some cases where MECCRLB fails to correct two, three, and four
errors. Instead, (12) expresses the correction capability for up to four random errors. The first term
represents single error correction in different rows. The second term expresses the correction of double
errors in the message, except when one of the errors falls in the 10-bit parity checks. The third term
expresses the correction of three errors in the message, except when one or two of the errors fall in the
10-bit parity checks. The fourth term expresses the correction of four errors in three cases: first, when all
four errors are in one row; second, when three errors are in one row and the fourth is in any other message
bit; and third, when two errors are in one row and the other two errors are elsewhere in the message.
Substituting (12) in (10) gives $P_{residual}$ for MECCRLB for random errors.

Figure 5. Examples of failure cases for the work in [29] with two, three and four errors

Now that $P_{residual}$ has been derived for both the extended Hamming product code with type II HARQ and
MECCRLB, for a 32-bit message size the equations become functions of $\varepsilon$ only. Figure 6 shows the
estimated and simulated $P_{residual}$ for different values of $\varepsilon$. A C++ program was developed to
simulate the two techniques, with random errors generated at different error rates, as shown in Figure 6. The
results show that the estimated residual flit error rate is close to the simulated results. It can also be
seen from the Y-axis that, for the same value of $P_{residual}$, HPC can sustain a higher bit error rate. In
addition, a more detailed comparison between the two schemes was made in [33], which shows the superiority of
HPC in terms of reliability.
Figure 6. $P_{residual}$ for different bit error rates
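
The idea of comparing an estimate against a simulation can be illustrated with a small Monte Carlo sketch in Python. It is not the C++ program mentioned above: it only classifies the first transmission of an HPC(52, 32) flit under independent random bit errors (no error, or at most one error per row), and the function names, trial count and seed are arbitrary assumptions.

```python
import random


def simulate_first_transmission(eps, n1=13, k2=4, trials=50_000, seed=1):
    """Estimate P(no error) and P(at most one error per row, at least one error)."""
    rng = random.Random(seed)
    no_error = correctable = 0
    for _ in range(trials):
        # Number of flipped bits in each of the k2 rows of n1 bits.
        errors = [sum(rng.random() < eps for _ in range(n1)) for _ in range(k2)]
        worst = max(errors)
        if worst == 0:
            no_error += 1
        elif worst == 1:
            correctable += 1
    return no_error / trials, correctable / trials


def analytic_first_transmission(eps, n1=13, k2=4):
    """Closed-form counterparts of the two simulated probabilities."""
    none = (1.0 - eps) ** n1
    one = n1 * eps * (1.0 - eps) ** (n1 - 1)
    return none ** k2, (none + one) ** k2 - none ** k2


simulated = simulate_first_transmission(0.01)
estimated = analytic_first_transmission(0.01)
print("simulated:", simulated)
print("estimated:", estimated)
```

The two pairs of numbers should agree to within the sampling noise of the chosen trial count; a full reliability simulation would additionally model the retransmission and the three-stage decoding.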


3.2. Link swing voltage 
On-chip communication errors can be attributed to voltage perturbations induced by noise from
many sources. The error probability of a single wire can be modeled by a Gaussian pulse function [34]:

$$\varepsilon = Q\left(\frac{V_{swing}}{2\sigma_N}\right) = \int_{V_{swing}/(2\sigma_N)}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-y^2/2} \, dy \qquad (13)$$

where $V_{swing}$ is the link swing voltage and $\sigma_N$ is the standard deviation of the noise voltage,
which is assumed to follow a normal distribution. Therefore, adopting a highly reliable error correcting
coding technique in NoC [11] allows the link swing voltage to be reduced, as follows from (13):

$$V_{swing} = 2\sigma_N Q^{-1}(\varepsilon') \qquad (14)$$

where $Q^{-1}(\cdot)$ is the inverse Gaussian function and $\varepsilon'$ is the bit error rate at which
$P_{residual}(\varepsilon')$ equals the maximum permissible residual error probability. Figure 7 compares the
link swing voltage for different error control schemes, where $\sigma_N$ is assumed to be 0.1 V. The Hamming
product code achieves a lower swing voltage than MECCRLB. The link power consumption $P_{wl}$ is related to
the interconnect capacitance $C_L$, the wire switching factor $\alpha$, the link width $W_L$, the link swing
voltage $V_{swing}$ and the clock frequency $f_{clk}$. The link power $P_{wl}$ can be expressed as:

$$P_{wl} = C_L \, \alpha \, W_L \, V_{swing}^2 \, f_{clk} \qquad (15)$$


where $\alpha$ is assumed to be 0.1 and $W_L$ depends on the error control scheme. The link swing voltage
$V_{swing}$ depends on the reliability requirement of each error control scheme according to (14). For a
given reliability requirement, error control codes with low error correction capability need a higher link
swing voltage than error control schemes with high error correction capability. Figure 8 shows the link power
consumption of different error control schemes for two given reliability requirements, that is, two target
values of $P_{residual}$. The power consumption is estimated for 45 nm technology. The wire capacitance $C_L$
is assumed to be 208 fF/mm and the clock frequency is 500 MHz. Because of the higher detection capability of
the Hamming product coding scheme, it uses a low swing voltage, which results in low link power consumption
compared to the MECCRLB code.

Figure 7. Link swing voltage

Figure 8. Link power consumption of different error control schemes
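
The relation between tolerable bit error rate, swing voltage and link power in (14) and (15) can be explored with a short Python sketch. The default values (noise standard deviation 0.1 V, switching factor 0.1, 208 fF/mm, 500 MHz) are the ones stated in the text; the function names, the two example bit error rates, the 52-wire link width and the assumption of a 1 mm link are illustrative only and are not results from the paper.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()          # zero mean, unit variance


def q_function(x):
    """Gaussian tail probability Q(x) = P(Y > x) for a standard normal Y."""
    return 1.0 - _STANDARD_NORMAL.cdf(x)


def q_inverse(p):
    """Inverse of the Q function."""
    return _STANDARD_NORMAL.inv_cdf(1.0 - p)


def swing_voltage(eps_prime, sigma_n=0.1):
    """Swing voltage (V) needed for a tolerable wire error rate eps_prime, per (14)."""
    return 2.0 * sigma_n * q_inverse(eps_prime)


def link_power(v_swing, width, c_l=208e-15, alpha=0.1, f_clk=500e6):
    """Link power (W) per (15), assuming a 1 mm link so that c_l is in farads."""
    return c_l * alpha * width * v_swing ** 2 * f_clk


# A stronger code tolerates a higher raw bit error rate for the same target
# reliability, so it can run at a lower swing voltage.
for eps_prime in (1e-3, 1e-6):           # hypothetical tolerable bit error rates
    v = swing_voltage(eps_prime)
    print(eps_prime, v, link_power(v, width=52))
```

Because the power in (15) grows with the square of the swing voltage, even a modest reduction in the required swing has a visible effect on link power.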


3.3. Area, power and delay 

The area and delay of the proposed error control scheme and of the other error control schemes are shown
in Table 1. All three were developed in Verilog HDL and functionally verified in ModelSim. They were then
synthesized in SDC using the Nangate 45 nm library. Table 1 shows that the proposed implementation of the
Hamming product code (52, 32) occupies 20% less area than the HPC in [24], since the proposed scheme combines
the three encoder circuits (row, column, row) into one general encoder at the encoder side and uses one
shared row decoder instead of two row decoders at the decoder side. The proposed work has a larger area than
MECCRLB [29], which is the cost of its more powerful error correction. The table also shows the encoder and
decoder critical path delays for the three schemes. As expected, MECCRLB achieves the lowest delay due to its
lower complexity encoder and decoder circuits. The proposed coding scheme introduces a slight increase in the
decoder delay compared to the HPC in [24], due to the additional multiplexing circuit around the shared row
decoder. In pipelined operation, where the encoder and decoder are separate pipeline stages, the maximum
frequency is limited by the slowest stage, which is the decoder for all the schemes.

Accordingly, MECCRLB, the Hamming product code, and the proposed coding achieve maximum frequencies of
1.1, 1, and 0.9 GHz, respectively. Figures 9(a) and 9(b) show the link and codec power at a frequency of
500 MHz for two target values of $P_{residual}$. From Figure 9(a) it can be seen that MECCRLB consumes less
codec power but more link power than the other two works. This is because its authors used simple coding
circuits that consume less codec power at the cost of a lower correction capability, which requires a higher
voltage swing. The higher voltage swing translates into higher link power consumption, as the power
consumption of the link is directly proportional to the square of the link swing voltage. The proposed coding
scheme and that in [24] have the same link power, since they use the same correction technique, but the
proposed work reduces the codec power by 28% through its optimized encoding and decoding circuits.
Figure 9(b) shows the link and codec power for the lower target value of $P_{residual}$. The codec power is
not affected and is the same as in (a), but the link power increases for the three schemes, since the
increase in the target reliability (lower $P_{residual}$) increases the voltage swing accordingly, which in
turn increases the link power. It should be noted that even at the lower $P_{residual}$, the Hamming product
code used in [24] and in our work results in lower link power consumption than MECCRLB, due to its higher
correction capability.


Table 1. Implementation results

Error correction code                      Area (µm²)   Encoder delay (ns)   Decoder delay (ns)
MECCRLB                                    744          0.5                  0.9
Hamming product code with type II HARQ     3574         0.8                  1.0
Proposed                                   2850         0.8                  1.1
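
The area saving and the maximum frequencies quoted in this section follow directly from the table entries, as the short Python sketch below shows. It assumes the delays are in nanoseconds and that the pipeline clock is limited by the slower of the encoder and decoder stages; the dictionary layout and function name are illustrative.

```python
# (area in square micrometres, encoder delay in ns, decoder delay in ns)
results = {
    "MECCRLB": (744, 0.5, 0.9),
    "HPC with type II HARQ": (3574, 0.8, 1.0),
    "Proposed": (2850, 0.8, 1.1),
}


def max_frequency_ghz(encoder_ns, decoder_ns):
    """Pipeline clock limit set by the slowest stage (1 / ns = GHz)."""
    return 1.0 / max(encoder_ns, decoder_ns)


area_saving = 1.0 - results["Proposed"][0] / results["HPC with type II HARQ"][0]
frequencies = {name: max_frequency_ghz(enc, dec) for name, (_, enc, dec) in results.items()}

print(f"area saving vs. HPC: {area_saving:.1%}")
for name, freq in frequencies.items():
    print(f"{name}: {freq:.2f} GHz")
```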

4. CONCLUSION

This paper presented a lightweight realization of the Hamming product code with type II HARQ 
which is capable of correcting 100% of error patterns that have five errors (in full message transmission). 
The proposed code can also correct burst errors of up to 16 bits or a combination of random and burst errors. 
The resource sharing technique used reduced the area by 20% and the power by 28% with only a slight 
increase in the decoder delay. Because of the high error correction capability of the proposed error control 
code, it achieved low swing voltage, which resulted in low link power consumption. The low swing voltage 
resulted in the reduction of the total power consumption by up to 58% compared to other error control codes. 